Goto

Collaborating Authors

 key pose


An Interpretable Neural Control Network with Adaptable Online Learning for Sample Efficient Robot Locomotion Learning

Srisuchinnawong, Arthicha, Manoonpong, Poramate

arXiv.org Artificial Intelligence

Robot locomotion learning using reinforcement learning suffers from training sample inefficiency and exhibits the non-understandable/black-box nature. Thus, this work presents a novel SME-AGOL to address such problems. Firstly, Sequential Motion Executor (SME) is a three-layer interpretable neural network, where the first produces the sequentially propagating hidden states, the second constructs the corresponding triangular bases with minor non-neighbor interference, and the third maps the bases to the motor commands. Secondly, the Adaptable Gradient-weighting Online Learning (AGOL) algorithm prioritizes the update of the parameters with high relevance score, allowing the learning to focus more on the highly relevant ones. Thus, these two components lead to an analyzable framework, where each sequential hidden state/basis represents the learned key poses/robot configuration. Compared to state-of-the-art methods, the SME-AGOL requires 40% fewer samples and receives 150% higher final reward/locomotion performance on a simulated hexapod robot, while taking merely 10 minutes of learning time from scratch on a physical hexapod robot. Taken together, this work not only proposes the SME-AGOL for sample efficient and understandable locomotion learning but also emphasizes the potential exploitation of interpretability for improving sample efficiency and learning performance.


ImitationNet: Unsupervised Human-to-Robot Motion Retargeting via Shared Latent Space

Yan, Yashuai, Mascaro, Esteve Valls, Lee, Dongheui

arXiv.org Artificial Intelligence

This paper introduces a novel deep-learning approach for human-to-robot motion retargeting, enabling robots to mimic human poses accurately. Contrary to prior deep-learning-based works, our method does not require paired human-to-robot data, which facilitates its translation to new robots. First, we construct a shared latent space between humans and robots via adaptive contrastive learning that takes advantage of a proposed cross-domain similarity metric between the human and robot poses. Additionally, we propose a consistency term to build a common latent space that captures the similarity of the poses with precision while allowing direct robot motion control from the latent space. For instance, we can generate in-between motion through simple linear interpolation between two projected human poses. We conduct a comprehensive evaluation of robot control from diverse modalities (i.e., texts, RGB videos, and key poses), which facilitates robot control for non-expert users. Our model outperforms existing works regarding human-to-robot retargeting in terms of efficiency and precision. Finally, we implemented our method in a real robot with self-collision avoidance through a whole-body controller to showcase the effectiveness of our approach. More information on our website https://evm7.github.io/UnsH2R/


Nonverbal Social Behavior Generation for Social Robots Using End-to-End Learning

Ko, Woo-Ri, Jang, Minsu, Lee, Jaeyeon, Kim, Jaehong

arXiv.org Artificial Intelligence

To provide effective and enjoyable human-robot interaction, it is important for social robots to exhibit nonverbal behaviors, such as a handshake or a hug. However, the traditional approach of reproducing pre-coded motions allows users to easily predict the reaction of the robot, giving the impression that the robot is a machine rather than a real agent. Therefore, we propose a neural network architecture based on the Seq2Seq model that learns social behaviors from human-human interactions in an end-to-end manner. We adopted a generative adversarial network to prevent invalid pose sequences from occurring when generating long-term behavior. To verify the proposed method, experiments were performed using the humanoid robot Pepper in a simulated environment. Because it is difficult to determine success or failure in social behavior generation, we propose new metrics to calculate the difference between the generated behavior and the ground-truth behavior. We used these metrics to show how different network architectural choices affect the performance of behavior generation, and we compared the performance of learning multiple behaviors and that of learning a single behavior. We expect that our proposed method can be used not only with home service robots, but also for guide robots, delivery robots, educational robots, and virtual robots, enabling the users to enjoy and effectively interact with the robots.


FP-Loc: Lightweight and Drift-free Floor Plan-assisted LiDAR Localization

Gao, Ling, Kneip, Laurent

arXiv.org Artificial Intelligence

We present a novel framework for floor plan-based, full six degree-of-freedom LiDAR localization. Our approach relies on robust ceiling and ground plane detection, which solves part of the pose and supports the segmentation of vertical structure elements such as walls and pillars. Our core contribution is a novel nearest neighbour data structure for an efficient look-up of nearest vertical structure elements from the floor plan. The registration is realized as a pair-wise regularized windowed pose graph optimization. Highly efficient, accurate and drift-free long-term localization is demonstrated on multiple scenes.


DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer

Li, Buyu, Zhao, Yongchi, Sheng, Lu

arXiv.org Artificial Intelligence

In this work, we propose a novel deep learning framework that can generate a vivid dance from a whole piece of music. In contrast to previous works that define the problem as generation of frames of motion state parameters, we formulate the task as a prediction of motion curves between key poses, which is inspired by the animation industry practice. The proposed framework, named DanceNet3D, first generates key poses on beats of the given music and then predicts the in-between motion curves. DanceNet3D adopts the encoder-decoder architecture and the adversarial schemes for training. The decoders in DanceNet3D are constructed on MoTrans, a transformer tailored for motion generation. In MoTrans we introduce the kinematic correlation by the Kinematic Chain Networks, and we also propose the Learned Local Attention module to take the temporal local correlation of human motion into consideration. Furthermore, we propose PhantomDance, the first large-scale dance dataset produced by professional animatiors, with accurate synchronization with music. Extensive experiments demonstrate that the proposed approach can generate fluent, elegant, performative and beat-synchronized 3D dances, which significantly surpasses previous works quantitatively and qualitatively.